Skip to content

Conversation

iclsrc
Copy link
Collaborator

@iclsrc iclsrc commented Sep 27, 2025

pabloantoniom and others added 30 commits September 10, 2025 09:04
This PR deletes the `createLowerGpuOpsToROCDLOpsPass` constructor from
the .td file, making the `createConvertGpuOpsToROCDLOps` pass available
to users. This has the following effects:

1. `createLowerGpuOpsToROCDLOpsPass` is not available anymore. Instead,
`createConvertGpuOpsToROCDLOps` should be used. This makes the interface
consistent with ConvertGpuOpsToNVVMOps.

2. To call `createConvertGpuOpsToROCDLOps`, the options must be passed
via ConvertGpuOpsToROCDLOpsOptions. This has the side effect of
making the `allowed-dialects` option available, which was not accessible
via C++ before.
Avoid using extends, and adding the high and low half and use
extadd_pairwise instead.
…any order (#157407)

I have seen some flakiness in this test where the 2 checked strings
appear in a different order. Due to buffering of writes, and that one of
these strings is written during the signal handler, I think this is
valid. 

This PR relaxes the test to allow those strings to appear in
either order.
Checking `isOperationLegalOrCustom` instead of `isOperationLegal` allows
more optimization opportunities. In particular, if a target wants to
mark `extract_vector_elt` as `Custom` rather than `Legal` in order to
optimize some certain cases, this combiner would otherwise miss some
improvements.

Previously, using `isOperationLegalOrCustom` was avoided due to the risk
of getting stuck in infinite loops (as noted in
llvm/llvm-project@61ec738).
After testing, the issue no longer reproduces, but the coverage is
limited to the regression/unit tests and the test-suite.
In simplifyBlends, when normalizing a blend recipe, the first mask that
is used only by the blend and is not all-false is chosen, and its
corresponding incoming value becomes the initial value, with the others
blended into it. At the same time, the mask that is chosen can be
eliminated. However, a multi-user mask might be used by a dead recipe,
which prevents this optimization. This patch moves removeDeadRecipes
before simplifyBlends to eliminate dead recipes, allowing simplifyBlends
to remove more dead masks.
Implements the base of the MemoryLegalizer for a roughly correct GFX1250 memory model.
Documentation will come later, and some remaining changes still have to be added, but this is the backbone of the model.
… (#157639)

This reverts commit be17791.

This is not necessary for gfx1250 anymore.
…#156714)

Match `(X * Y) + Z` in `combineAdd`. If target supports and we don't
overflow (ie. we know the top 12 bits are unset), rewrite using
VPMADD52L

Have just done the `L` version for now at least, wanted to get feedback
before continuing
Apply the following changes:

* Ensure all float types are covered (`f16` and `f128` were often
missing)
* Switch to more straightforward test names
* Remove some CHECK directives that are outdated (prefix changed but the
directive did not get removed)
* Add common check prefixes to merge similar blocks
* Test a more similar set of platforms
* Add missing `nounwind`
* Test `strictfp` for each libcall where possible

This is a pre-test for [1].

[1]: llvm/llvm-project#152684
Align the syntax used for the optimization level argument of the
expand-fp pass in textual descriptions of pass pipelines with the syntax
used by other passes taking a similar argument. That is, use e.g.
`expand-fp<O1>` instead of `expand-fp<opt-level=1>`.
`f16` is more functional than just a storage type on the platform,
though it does have some codegen issues [1]. To prepare for future
changes, do the following nonfunctional updates to the existing `half`
test:

* Add tests for passing and returning the type directly.
* Add tests showing bitcast behavior, which is currently incorrect but
serves as a baseline.
* Add tests for `fabs` and `copysign` (trivial operations that shouldn't
require libcalls).
* Add invocations for big-endian and for PPC32.
* Rename the test to `half.ll` to reflect its status, which also matches
other backends.

[1]: llvm/llvm-project#97975
[AMDGPU] Treat GEP offsets as signed in AMDGPUPromoteAlloca

AMDGPUPromoteAlloca can transform i32 GEP offsets that operate on
allocas into i64 extractelement indices. Before this patch, negative GEP
offsets would be zero-extended, leading to wrong extractelement indices
with values around (2**32-1).

This fixes failing LlvmLibcCharacterConverterUTF32To8Test tests for
AMDGPU.
…7702)

The default values for DebugLocs in LoopVectorizer/VPlan were recently
updated from empty DebugLocs to DebugLoc::getUnknown, as part of the
DebugLoc Coverage Tracking work. However, there are some cases where we
also pass an explicit empty DebugLoc, in many cases as a filler
argument. This patch updates all of these to `getUnknown` for now, until
either valid locations or a suitable categorization can be assigned to
each instead.

This change is NFC outside of DebugLoc coverage tracking builds.
`Predicates` and `Features` fields serve the same purpose. They should
be kept in sync, but not all predicates are based on features. This
resulted in introducing dummy features for that only reason.

This patch removes `Features` field and changes TableGen emitters to use
`Predicates` instead.

Historically, predicates were written with the assumption that the
checking code will be used in `SelectionDAGISel` subclasses, meaning
they will have access to the subclass variables, such as `Subtarget`.
There are no such variables in the generated
`GenSubtargetInfo::getHwModeSet()`, so we need to provide them. This can
be achieved by subclassing `HwModePredicateProlog`, see an example in
`Hexagon.td`.
Defaults to "agent" for targets that do not support it.

- Add documentation
- Register it in MachineModuleInfo
- Add MemoryLegalizer support
This test is strange since it's full of decoding failure warnings
Make sure the tested error is the literal error, not
for unaligned registers.
This will add label `clang:temporal-safety` to PRs touching the mentioned files.
Removing statefullness also adds the benefit of short circuiting.
During CSE, we don't have to drop all poison-generating flags on
mis-match, we can keep the ones common on both recipes.

PR: llvm/llvm-project#157664
The test introduced by PR #157408 requires the amdgpu target. Move it to
the subdirectory which only runs if the target is available.
The limit 'dfa-max-num-paths' that is used to control number of
enumerated paths was not checked against inside getPathsFromStateDefMap.
It may lead to large memory consumption for complex enough switch
statements.

Reland llvm/llvm-project#145482
Required for `BUILD_SHARED_LIBS=ON` builds with optimizations disabled
for the new FortranUtils library.

Also see #150027 #155422
There are no references to it anymore in the codebase.
…(#157716)

In PromoteMem2Reg, we perform a DFS over the CFG and track, for each
alloca, its incoming value and its associated incoming DebugLoc, both of
which are taken from stores to that alloca; these values and DebugLocs
are propagated to PHI nodes when new blocks are reached. In the event
that for one incoming edge no store instruction has been seen, we
propagate an UndefValue and an empty DebugLoc to the PHI.

This is a perfectly valid occurrence, and assigning an empty DebugLoc to
the PHI is the correct course of action; therefore, we should pass an
annotated DebugLoc instead, so that in DebugLoc coverage tracking we
correctly do not expect a valid DebugLoc to be present; we generally
mark allocas as having CompilerGenerated locations, so I've chosen to
use the same annotation to represent the uninitialized value of that
alloca.

This change is NFC outside of DebugLoc coverage tracking builds.
…0169)

Supporting Min/Max Operations: `min`, `max`, `umin`, `umax`
This interface isn't used anywhere anymore.
@jsji jsji force-pushed the llvmspirv_pulldown branch from 4bdb74e to c5dedea Compare September 29, 2025 19:32
@jsji jsji force-pushed the llvmspirv_pulldown branch from c5dedea to f125dd2 Compare September 29, 2025 22:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
disable-lint Skip linter check step and proceed with build jobs
Projects
None yet
Development

Successfully merging this pull request may close these issues.